\section{The referring expression generation algorithm} \label{sec:algorithm}

Refinement algorithms for GRE are based on the following basic idea: given a 
scene $S$, the objects appearing in $S$ are successively classified according
to their properties into finer and finer classes. A description (in some formal language $\mathcal{L}$) of each 
class is computed every time a class is refined. The procedure always stops when the 
set of classes stabilizes, i.e., no further refinement is possible with the information 
available in the scene\footnote{Of course, if we are only interested in a referring expression for 
a given target we can stop the procedure as soon as the target is the only element of some of the classes.}.  If the target element is in a singleton class, then the formal description of that class is a referring expression; otherwise the 
target cannot be unequivocally described (in $\mathcal{L}$).  

We present a modification of the algorithm in~\cite{arec2:2008:Areces} where the fixed order of properties in the 
input scene is replaced by a finite probability distribution. 
%Required changes
%are fairly straightforward (see Figure~\ref{fig:algo3}), but 
%the behavior of the resulting algorithm is strikingly changed. To start with, 
The resulting algorithm (see Figure~\ref{fig:algo3}) is now non-deterministic: two runs of the algorithm with the same 
input might result in different REs for objects in the scene.
The input to the algorithm will be a relational model $\mathcal{M} = \tup{\Delta, \interp{\cdot}}$,
where $\Delta$ is the non-empty domain of objects in the scene, and $\interp{\cdot}$ is an 
interpretation function that assigns to all properties in the scene its intended extension.  For example, 
the scene shown in Figure~\ref{Tuna-scene} could be represented by the model $\gM=\tup{\Delta,\interp{\cdot}}$ shown in Figure~\ref{TUNA-scene-graph}. In Figure~\ref{TUNA-scene-graph}, $\Delta = \{e_1,\ldots,e_7\}$, and for example the extension of blue is $\interp{\emph{blue}} = \{e_5,e_6,e_7\}$ because 3 objects are blue in the scene. In the Figure, xn indicates that the object is in position n with regard to its x-dimension in the grid and yn is interpreted similarly. 
%A scene can be encoded in different ways as a relational model. The algorithm assumes that these issues have been resolved and that the model encodes a suitable representation of the scene we want to describe. We also assume that all relations are \emph{binary} (relations of other arity can be encoded as binary).

\newcommand{\nChair}{\mathit{chair}\xspace}
\newcommand{\nFan}{\mathit{fan}\xspace}
\newcommand{\nGray}{\mathit{gray}\xspace}
\newcommand{\nBlue}{\mathit{blue}\xspace}
\newcommand{\nGreen}{\mathit{green}\xspace}
\newcommand{\nLeft}{\mathit{left}\xspace}
\newcommand{\nRight}{\mathit{right}\xspace}
\newcommand{\nSmall}{\mathit{small}\xspace}
\newcommand{\nLarge}{\mathit{large}\xspace}
\newcommand{\nBig}{\mathit{big}\xspace}
\newcommand{\nEquisuno}{\mathit{x1}\xspace}
\newcommand{\nEquisdos}{\mathit{x2}\xspace}
\newcommand{\nEquistres}{\mathit{x3}\xspace}
\newcommand{\nEquiscuatro}{\mathit{x4}\xspace}
\newcommand{\nEquiscinco}{\mathit{x5}\xspace}

\newcommand{\nYuno}{\mathit{y1}\xspace}
\newcommand{\nYdos}{\mathit{y2}\xspace}
\newcommand{\nYtres}{\mathit{y3}\xspace}
\newcommand{\nFront}{\mathit{front}\xspace}
\newcommand{\nBack}{\mathit{back}\xspace}

\newcommand{\nBall}{\mathit{ball}\xspace}
\newcommand{\nCube}{\mathit{cube}\xspace}
\newcommand{\nOntop}{\mathit{ontop}\xspace}
\newcommand{\nTop}{\mathit{top}\xspace}
\newcommand{\nBelow}{\mathit{below}\xspace}
\newcommand{\nRightof}{\mathit{rightof}\xspace}
\newcommand{\nLeftof}{\mathit{leftof}\xspace}
\vspace*{-.5cm}
\begin{figure}[ht]
\begin{minipage}[b]{0.5\linewidth}
\centering
\includegraphics[width=\textwidth]{images/tuna.jpg}
\vspace*{-.25cm}
\vspace*{-.4cm}\caption{Scene, target \emph{blue chair facing left}}
\label{Tuna-scene}
\end{minipage}
\hspace*{-0.2cm}
\begin{minipage}[b]{0.5\linewidth}
\centering
\begin{tikzpicture}
  [
    n/.style={circle,fill,draw,inner sep=3pt,node distance=1.4cm},
    aArrow/.style={->, >=stealth, semithick, shorten <= 2pt, shorten >= 2pt},
  ]
 \node[n,label=above:$e_4$,label=below:{
    \relsize{-1}$\begin{array}{c}
      \nFront \\[-2pt]
      \nSmall \\[-2pt]
      \nGray \\[-2pt]
      \nEquisuno \\[-2pt]
      \nYdos \\[-2pt]
      \nFan \\ \end{array}$}] (d) {};

 \node[n,label=above:$e_5$,label=below:{
    \relsize{-1}$\begin{array}{c}
      \nBack \\[-2pt]
      \nLarge \\[-2pt]
      \nBlue \\[-2pt]
      \nEquisdos \\[-2pt]
      \nYdos \\[-2pt]
      \nChair \\\end{array}$}, right of=d] (e) {};

 \node[n,label=above:$e_6$,label=below:{
    \relsize{-1}$\begin{array}{c}
      \nBack \\[-2pt]
      \nLarge \\[-2pt]
      \nBlue \\[-2pt]
      \nEquiscuatro \\[-2pt]
      \nYtres \\[-2pt]
      \nFan\\ \end{array}$}, right of=e] (f) {};

 \node[n,label=above:$e_7$,label=below:{
    \relsize{-1}$\begin{array}{c}
      \nLeft \\[-2pt]
      \nLarge \\[-2pt]
      \nBlue \\[-2pt]
      \nEquiscinco \\[-2pt]
      \nYtres \\[-2pt]
      \nChair\end{array}$}, right of=f] (g) {};
\node[n,label=above:$e_1$,label=right:{
    \relsize{-1}$\begin{array}{c}
      \nLeft\\[-2pt]
      \nLarge\\[-2pt] 
      \nGreen \\[-2pt]
      \nEquisdos \\[-2pt]
      \nYuno \\[-2pt]
      \nFan\end{array}$}, above of=d] (a) {};

 \node[n,label=above:$e_2$,label=right:{
    \relsize{-1}$\begin{array}{c}
      \nLeft\\[-2pt]
      \nLarge\\[-2pt] 
      \nGreen \\[-2pt]
      \nEquiscuatro \\[-2pt]
      \nYuno \\[-2pt]
      \nChair\end{array}$}, above of=e] (b) {};

 \node[n,label=above:$e_3$,label=right:{
    \relsize{-1}$\begin{array}{c}
      \nFront \\[-2pt]
      \nSmall \\[-2pt]
      \nGray \\[-2pt]
      \nEquiscinco \\[-2pt]
      \nYuno \\[-2pt]
      \nChair\end{array}$}, above of=f] (c) {};
 \draw[dotted] (-0.5,-2.0) rectangle (5.0,2.5);

 \end{tikzpicture}
\vspace*{-.4cm}\caption{The scene as a relational model}
\label{TUNA-scene-graph}
\end{minipage}
\end{figure}

\vspace*{-.5cm}
On termination, the algorithm computes what are called the $\mathcal{L}$-similarity classes of the input model $\gM$. Intuitively, if two elements in the model belong to the same $\mathcal{L}$-similarity class, then $\mathcal{L}$ is not expressive enough to tell them apart (i.e., no formula in $\mathcal{L}$ can distinguish them). All the objects in Figure~\ref{Tuna-scene} are distinguishable, but if, for instance, color and position are not considered then $e_2$ and $e_7$ are indistinguishable and, hence, will remain in the same similarity class when the algorithm terminates. 

The algorithm we discuss uses formulas of the $\el$ description logic language~\cite{baad:desc03} to describe refinement classes\footnote{Notice, though, that the particular formal language used is independent of the main algorithm, and different add$_{\mathcal{L}}$(R,$\varphi$,\RE) functions can be used depending on the language involved.}. For a detailed description of $\el$, we refer to~\cite{baad:desc03}.  
The interpretation of the $\el$ formula $\exists$\emph{green}.$\top$ is the set of all the green elements of the model. In Figure~\ref{Tuna-scene}, $\interp{\exists$\emph{green}.$\top}=\{e_1,e_2\}$. The interpretation of $\psi \sqcap \varphi$ is the set of all elements that satisfy $\psi$ and $\varphi$. In Figure~\ref{Tuna-scene}, $\interp{\exists$\emph{green}.$\top \sqcap \exists$\emph{chair}.$\top}=\{e_2\}$.

Now that we have an intuitive understanding of $\el$, we are ready to describe Algorithms~\ref{algo:bisim-l} and~\ref{algo:bisim-add-el-over}. 

\begin{figure}[!t]
\small
\centering
\begin{algorithm}[H]
\dontprintsemicolon
\caption{Computing $\mathcal{L}$-similarity classes}\label{algo:bisim-l}
\KwIn{\footnotesize A model $\gM$ and a list Rs $\in (\REL \times [0,1])^*$
 of relation symbols with their \puse\ values, ordered by \puse}
\KwOut{\footnotesize A set of formulas \RE such that
$\{\interp{\varphi} \mid \varphi \in \RE\}$ is the set of
$\mathcal{L}$-similarity classes of $\gM$}

$\RE \leftarrow \{\top\}$\tcp*[f]{\footnotesize the most general description $\top$ applies to all elements in the scene}

\For{\em (R,R.\puse) $\in$ Rs}{
	R.\randomuse = Random(0,1)\tcp*[f]{\footnotesize R.\randomuse is the probability of using R} 
        R.\incuse = (1 $-$ R.\puse) / MaxIterations
}

\Repeat{\em $\forall$((R,R.\puse) $\in$ Rs).(R.\puse $\ge$ 1)\tcp*[f]{\footnotesize R.\puse\ are incremented until 1}}{
  \While(\tcp*[f]{\footnotesize while some class has at least two elements}){\em $\exists (\varphi \in$ \RE)$.(\#\interp{\varphi}>1)$}{
      \RE' $\leftarrow$ \RE \tcp*[f]{\footnotesize make a copy for future comparison} \\
      \For{\em (R, R.\puse) $\in$ Rs}{
          \If(\tcp*[f]{\footnotesize R will be used in the expression}){\em R.\randomuse $\le$ R.\puse}{
              \lFor{\em $\varphi \in$ \RE}{
                  add$_\mathcal{EL}$(R, $\varphi$, \RE)\tcp*[f]{\footnotesize refine classes using R}}
                  }\
              \If(\tcp*[f]{\footnotesize the classification has changed}){\em \RE $\not =$ \RE'}{exit\tcp*[f]{\footnotesize exit for-loop to try again highest R.\puse}}
              }
     \If(\tcp*[f]{\footnotesize the classification has stabilized}){\em \RE $=$ \RE'}{exit\tcp*[f]{\footnotesize exit while-loop to increase R.\puse}}
  }
  \lFor{\em (R,R.\puse) $\in$ Rs}{
    R.\puse $\leftarrow$ R.\puse $+$ R.\incuse\tcp*[f]{\footnotesize increase R.\puse}
  }
}
\end{algorithm}

\begin{algorithm}[H]
\dontprintsemicolon
\caption{add$_\el$(R, $\varphi$, \RE)} \label{algo:bisim-add-el-over}

\If(\tcp*[f]{\footnotesize are we in the first loop?}){\em FirstLoop?}{
    Informative $\leftarrow$ TRUE \tcp*[f]{\footnotesize allow overspecification}}
\lElse(\tcp*[f]{\footnotesize informative: smaller than the original?}) {Informative $\leftarrow$ $\interp{\psi \sqcap \exists \mbox{\em R}.\varphi} \neq \interp{\psi}$} 
\For{\em $\psi \in$ \RE with $\#\interp{\psi} > 1$}{
  \If{\em $\psi \sqcap \exists$R.$\varphi$ is not subsumed in \RE\ {\bf and} \tcp*[f]{\footnotesize non-redundant: can't be obtained from \RE?}\\
    \em \ \ \ $\interp{\psi \sqcap \exists \mbox{\em R}.\varphi} \neq \emptyset$ {\bf and} \tcp*[f]{\footnotesize non-trivial: has elements?}\\
     \ \ \  \emph{Informative}}{
    add $\psi \sqcap \exists \mbox{R}.\varphi$ to $\RE$ \tcp*[f]{\footnotesize add the new class to the classification}
    remove subsumed formulas from $\RE$ \tcp*[f]{\footnotesize remove redundant classes}
  }
}
\end{algorithm}
\vspace*{-.5cm}\caption{Refinement algorithm with probabilities for the \el-language}\label{fig:algo3}
\end{figure}

Algorithm~\ref{algo:bisim-l} takes as input a model and a list Rs of pairs (R,R.\puse) that links each relation R $\in \REL$, the set of all relation symbols in the model\footnote{We represent each unary relation R as binary, hence 
 $\interp{\exists \emph{R}.\top}$ is the set of all elements in the model that have the property R},  to some probability of use R.\puse. For example, \emph{green} and \emph{large} are relations in the model of Figure~\ref{TUNA-scene-graph}.  
%I.e., if $\REL$ is the set of all relation symbols in the model then Rs $\in (\REL \times [0,1])^*$. We assume Rs to be ordered by R.\puse. 
%
The set $\RE$ contains the formal description of the refinement classes and it is initialized by the most general description $\top$. The formula $\top$ can be intuitively understood as the referring expression \emph{thing} or \emph{thingummy}.  
For each R, we first compute R.\randomuse, a random number in [0,1].  If R.\randomuse $\le$ R.\puse\ then R is used to refine the set of classes.  The value of R.\puse\ will be incremented by R.$\incuse$ in each main loop, to ensure that all relations are, at some point, considered by the algorithm.  This ensures that a referring expression will be found if it exists; but gives higher probability to expressions using relations with a high R.\puse. 
 %
While $\RE$ contains descriptions that can be refined (i.e., classes with at least two elements) the refinement function add$_\mathcal{L}$(R,$\varphi$,$\RE$) is called successively with each relation in Rs. If the model contains binary relations between its elements, a change in one of the classes, can trigger changes in others. For that reason, if $\RE$ changes, we exit the \textbf{for} loop to start again with the relations of higher R.\puse. If after trying to refine the set with all relations in Rs, the set $\RE$ has not changed, then we have reached a stable state (i.e., the classes described in $\RE$ cannot be further refined with the current R.\puse\ values). We will then increment all the R.\puse\ values and start the procedure again. 

Algorithm~\ref{algo:bisim-add-el-over} behaves as follows. The \textbf{for} loop refines each description in $\RE$ using the relation R and the other descriptions already in $\RE$, under certain conditions. The new description should be \emph{non-redundant} (it cannot be obtained from classes already in $\RE$), \emph{non-trivial} (it is not empty), and \emph{informative} (it does not coincide with the original class).  If these conditions are met, the new description is added to $\RE$, and redundant descriptions created by the new description are eliminated. The \textbf{if} statement at the beginning of Algorithm~\ref{algo:bisim-add-el-over} disregards the informativity test during the first loop of the algorithm allowing overspecification; without this condition the algorithm would generate minimal REs. For example, a minimal RE for $e_2$ is ``the green chair'' while an overspecified RE for this element is ``the green chair in the top row''.     

